Midterm Project: Strawberry and Chemicals

Mi Zhang, Peng Liu, Yuanming Leng, Qiannan Shen

Data Cleaning for Strawberry

  • remove empty/missing values and trim extra white space in the cells
  • split columns that hold multiple items into separate columns
  • convert values “MEASURED IN CWT” to pounds by multiplying by 100 (1 cwt = 100 lb)
  • make extremely large values easier to read by using a log scale on “value”
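
The cleaning steps above can be sketched roughly as follows. This is only an illustration on made-up rows: the column names (state, data_item, value), the "(D)" placeholder for a missing value, and the new columns value_lb and log_value are assumptions, not the names used in our actual script.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy rows (hypothetical); the real dataset has more columns and rows
strawb_raw <- data.frame(
  state     = c("CALIFORNIA", "FLORIDA ", "OREGON"),
  data_item = c("STRAWBERRIES - PRODUCTION, MEASURED IN CWT",
                "STRAWBERRIES,  BEARING - APPLICATIONS, MEASURED IN LB",
                "STRAWBERRIES - PRODUCTION, MEASURED IN CWT"),
  value     = c("1,200", " 350", "(D)"),
  stringsAsFactors = FALSE
)

strawb <- strawb_raw %>%
  # trim leading/trailing white space and collapse repeated spaces
  mutate(across(where(is.character), str_squish)) %>%
  # split the combined item/unit column into two columns
  separate(data_item, into = c("item", "unit"), sep = ", MEASURED IN ") %>%
  # strip thousands separators; non-numeric entries become NA
  mutate(value = suppressWarnings(as.numeric(str_remove_all(value, ",")))) %>%
  # drop rows with missing values
  filter(!is.na(value)) %>%
  mutate(
    value_lb  = if_else(unit == "CWT", value * 100, value),  # 1 cwt = 100 lb
    log_value = log10(value_lb)                              # log scale
  )

print(strawb)
```

Keeping value_lb separate from the original value column makes it easy to check the unit conversion row by row.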

Data wrangling for Strawberry and Pesticides

  • drop empty rows/columns, remove white space
  • rename the Pesticide column to chemical so that it matches the column name in the strawberry data
  • use toupper() to capitalize all chemical names
  • use pivot_longer() to reshape the toxin columns and their levels into long format
  • use inner_join() to combine the Pesticide and Strawberry datasets
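
A minimal sketch of the wrangling steps above, on made-up data. The chemical names, toxin levels, and the Carcinogen column are hypothetical placeholders; only Pesticide, chemical, and Bee.Toxins come from our data, and the long-format column names toxin and level are assumed here.

```r
library(dplyr)
library(tidyr)

# Hypothetical pesticide table: one column per toxin type
pesticides <- data.frame(
  Pesticide  = c("chemical a", "chemical b", "chemical c"),
  Bee.Toxins = c("high", NA, "moderate"),
  Carcinogen = c(NA, "probable", "possible"),
  stringsAsFactors = FALSE
)

# Hypothetical strawberry table, already cleaned
strawb <- data.frame(
  state    = c("CALIFORNIA", "FLORIDA", "CALIFORNIA"),
  chemical = c("CHEMICAL A", "CHEMICAL B", "CHEMICAL D"),
  value    = c(100, 2500, 40000),
  stringsAsFactors = FALSE
)

pest_long <- pesticides %>%
  rename(chemical = Pesticide) %>%        # match the strawberry column name
  mutate(chemical = toupper(chemical)) %>% # match the upper-case chemical names
  pivot_longer(cols = -chemical,
               names_to = "toxin",
               values_to = "level",
               values_drop_na = TRUE)      # keep only toxins that have a level

# inner_join() keeps only chemicals present in both tables
joined <- inner_join(strawb, pest_long, by = "chemical")

print(joined)
```

Because inner_join() drops every row without a match, "CHEMICAL D" disappears from the result; this is the reason the data size shrinks after wrangling.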

Measurement units and value

  • Our main analysis is based on the measurement units, and our focus is on “MEASURED IN LB”
  • redefine value as strawberry production
  • Let’s explore our Shiny app (https://lemony.shinyapps.io/ma615-midterm/)

Mouse-over map

  • California and Florida have higher total annual strawberry production in pounds than other states.

Annual value of Strawberry in each state

  • The plot shows that California and Florida have increasingly used all kinds of chemicals on strawberries in recent years.
plot1("MEASURED IN LB")

Questions

  • Which toxin is related to higher strawberry production values?
  • Which type of chemical is more commonly related to toxicity?

Toxin level changes over years

  • Bee toxins are related to larger strawberry production values.

Further analysis for Florida

  • In Florida, insecticides account for a higher proportion.
p4 <- plot4("MEASURED IN LB", "FLORIDA")
ggplotly(p4, tooltip="y")

Bee Toxin

  • But looking solely at bee toxins, insecticides account for a higher proportion of strawberry production value in California.
p5 <- plot5("MEASURED IN LB", "Bee.Toxins", "CALIFORNIA")
ggplotly(p5, tooltip="y")

Further Analysis for Florida

  • This confirms that insecticides are more commonly related to toxicity.
p5 <- plot5("MEASURED IN LB", "Bee.Toxins", "FLORIDA")
ggplotly(p5, tooltip="y")

Limitations

  • Missing values
  • Data size shrank after wrangling (not able to match all chemicals)
  • We don’t know how chemical usage is related to strawberry production
  • We did not convert toxin levels into numeric levels because they are described in natural language and people define them differently.
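
One possible way to quantify how much data is lost when chemicals cannot be matched is sketched below. The data are made up and the object names (unmatched, matched, share_kept, rows_lost) are assumptions; this check was not part of our analysis.

```r
library(dplyr)

# Hypothetical strawberry rows and pesticide lookup table
strawb <- data.frame(
  chemical = c("CHEMICAL A", "CHEMICAL B", "CHEMICAL C", "CHEMICAL C"),
  value    = c(100, 2500, 40000, 38000),
  stringsAsFactors = FALSE
)
pest <- data.frame(
  chemical   = c("CHEMICAL A", "CHEMICAL B"),
  Bee.Toxins = c("high", "moderate"),
  stringsAsFactors = FALSE
)

# Rows of the strawberry data with no matching chemical in the pesticide table
unmatched <- anti_join(strawb, pest, by = "chemical")

# Rows that would survive an inner join on chemical
matched <- semi_join(strawb, pest, by = "chemical")

# Which chemicals are lost, and how many rows each one accounts for
unmatched_chemicals <- unmatched %>% count(chemical, name = "rows_lost")

# Share of the original rows that is kept after matching
share_kept <- nrow(matched) / nrow(strawb)

print(unmatched_chemicals)
print(share_kept)
```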

Conclusion

  • California and Florida have higher total strawberry production and, according to the Shiny display, more data are collected from those two states.
  • Bee toxins are related to higher strawberry production values than other types of toxins.
  • Insecticide is more commonly related to toxicity than fungicide in strawberry production.

Thanks

  • Professor Haviland
  • TA Bruce
  • Our lovely MA-615 Classmates
  • Our teammates

Citations